Abstract

The purpose of this study was to evaluate the psychometric properties of a newly developed instrument intended 
to measure faculty competence as it pertains to their role as advisors, particularly in medical and professional 
programs. A total of 166 students completed the Faculty Advisor’s Skills and Behaviors Inventory (FASBI). The 
psychometric properties of the FASBI were evaluated using the Rasch Rating Scale Model. Results indicate the 
FASBI is a psychometrically sound instrument capable of producing valid and reproducible measures.

Keywords: advising, faculty, psychometrics, measurement, instrument, medical education, higher education 

1. Introduction 

1.1 Introduction of Problem 

Student advising plays a critical role in student development (Light, 2001; Pizzolato, 2008; Reinarz & Ehrlich,
2002), and good advising has consistently been linked to students’ satisfaction with an institution (Baker &
Griffin, 2010; Elliott & Healy, 2001; Freeman, 2008) and to academic success (Campbell & Nutt, 2008; Museus &
Ravello, 2010). Beggs, Bantham, and Taylor (2008) reported that college and university advising often
influences students’ career decisions and direction as well. Although colleges and universities often indicate a
commitment to advising, extant research suggests advising is often “uneven in quality and ultimately ineffective”
(Hossler, Ziskin, & Gross, 2009, p. 8). Moreover, Hossler and colleagues (2009) and Kramer (2003)
note that most colleges and universities in the United States do not assess advising. This is particularly
unfortunate because the failure to evaluate the quality of advising at one’s institution conveys a message to the
faculty that advising is of low priority or is undervalued.

1.2 Purpose of the Present Study 

Programmatic assessments are an important tool for actively evaluating the successes, shortcomings, and outright
failures of a program, and they are essential for ongoing programmatic improvement. Given that faculty advising is
a topic of critical importance for students and their education, it is necessary that faculty advisors routinely be evaluated.
Objective assessments of faculty strengths and weaknesses can inform the development of training programs or 
other interventions designed to help faculty in this vitally important role. Thus, the purpose of this study was to 
evaluate the psychometric properties of a newly developed instrument intended to measure faculty competence 
as it pertains to their role as advisors, particularly in medical and professional programs (e.g., veterinary 
medicine, pharmacy, dentistry, physical therapy, etc.). 

2. Methods 

2.1 Participants 

Survey participants were students at the North Carolina State University College of Veterinary Medicine. A
census sample was used: all students enrolled across the four years of the Doctor of Veterinary Medicine (DVM)
program were invited to participate in the study. Of the 394 students invited, 167 provided valid responses for
the 25 items evaluating the skills and behaviors of faculty advisors. Students’ ages ranged from 21 to 49 years,
with a mean age of 26 (SD = 3.9) and a median age of 25. A complete breakdown of student demographic
characteristics is presented in Table 1.

Journal of Educational and Developmental Psychology, Vol. 6, No. 1; 2016 (www.ccsenet.org/jedp)


Table 1. Demographic characteristics of sample

                           N      %
Sex
  Male                    22   13.2
  Female                 145   86.8
Race
  White                  124   74.3
  Black                   11    6.6
  Other                   32   19.2
Ethnicity
  Hispanic/Latino(a)      22   13.2
DVM Year
  Class of 2016           41   24.6
  Class of 2017           39   23.4
  Class of 2018           39   23.4
  Class of 2019           48   28.7

2.2 Instrumentation 

A number of instruments are available for assessing faculty advising. However, most instruments focus on
elements that are typically less relevant to faculty in medical and health-related professional programs. For
example, most instruments assessing advising ask questions that focus on the extent to which faculty were able
to help students identify a major and select appropriate/relevant courses to ensure degree requirements are
fulfilled and students graduate in a timely manner. In medical and health-related programs, students have already
selected their major and typically navigate the curriculum as a cohort. Thus, there is little need to assess
these facets. Similarly, instruments that focus on advising as it pertains to graduate and doctoral students (e.g.,
the Graduate Advising Survey for Doctoral Students) tend to focus on many of the same issues, as well as
perceptions of the department’s climate, peer influence, and the advisor’s role as someone training researchers.
Given that medical and health-related programs have somewhat different needs, we developed a new survey based on
elements we believed to be particularly relevant for these types of programs (see Zimmerman & Mokma, 2004;
Barnes et al., 2011; Belcheir, 2000). This resulted in the development of the Faculty Advisor’s Skills and
Behaviors Inventory (FASBI).

2.3 Validation Framework 

Much has been written in the psychometric literature about the limitations of traditional statistical approaches for
validation studies. Royal (2010) lists six major weaknesses of traditional approaches: 1) erroneously treating
ordinal ratings as interval-level measures, 2) erroneously treating all items as equally important, 3) erroneously
assuming error is equal across all measures, 4) sample dependency, 5) reliance on parametric statistical
approaches that require normally distributed data, and 6) missing data that can seriously threaten score
validity. Many measurement scholars have declared Rasch models the “gold standard”
for psychometric validation studies because they overcome the limitations of traditional statistical models and
are the only measurement models that have the property of invariance (Royal, 2010; Salzberger, 2002; Wright,
1997).

Rasch models are particularly attractive for analyzing survey research data as they are able to separate person 
measures (e.g., knowledge, ability, skills, etc.) and item data (e.g., difficulty of item, difficulty of task, etc.) and 
explore how these two facets interact with one another. In the present study, the latent trait being measured is 
students’ tendency to endorse (agree with) a variety of survey items. Rasch models produce linear measures 
(called “logits”) and create a common linear continuum onto which both person and item measures are mapped. 
Because Rasch models are probabilistic models, the likelihood of a student endorsing an item can be modeled as 
a logistic function of the distance between a student and a survey item. Readers interested in learning more about 
Rasch measurement models are encouraged to see Engelhard (2014) and Bond and Fox (2007). 
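
As a minimal illustration of the logistic relationship described above, the following Python sketch computes the endorsement probability for the simplest (dichotomous) Rasch case. The function name and the example logit values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def endorsement_probability(person_logit: float, item_logit: float) -> float:
    """Dichotomous Rasch model: P(endorse) as a logistic function of
    the distance (in logits) between the person and the item."""
    distance = person_logit - item_logit
    return 1.0 / (1.0 + math.exp(-distance))

# Hypothetical locations, for illustration only.
p_equal = endorsement_probability(1.0, 1.0)   # person located at the item
p_above = endorsement_probability(2.0, 1.0)   # person one logit above the item
p_below = endorsement_probability(0.0, 1.0)   # person one logit below the item
print(round(p_equal, 3), round(p_above, 3), round(p_below, 3))
```

When a person and an item share the same location the probability of endorsement is exactly one half, and it rises or falls symmetrically as the person moves above or below the item.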

For the present study we opted to use the Rasch Rating Scale Model (RRSM) (Andrich, 1978) for the analysis of
survey data as it is particularly well-suited for polytomous data. According to the RRSM, the probability
of a person n responding in category x to item i is given by:

$$
P_{nix} = \frac{\exp \sum_{j=0}^{x} [\beta_n - (\delta_i + \tau_j)]}{\sum_{k=0}^{m} \exp \sum_{j=0}^{k} [\beta_n - (\delta_i + \tau_j)]}, \qquad x = 0, 1, \ldots, m
$$

where $\tau_0 \equiv 0$ so that $\exp \sum_{j=0}^{0} [\beta_n - (\delta_i + \tau_j)] = 1$; $\beta_n$ is the person’s position on the variable, $\delta_i$ is the scale
value (difficulty to endorse) estimated for each item i, and $\tau_1, \tau_2, \ldots, \tau_m$ are the m response thresholds estimated
for the m + 1 rating categories. Winsteps measurement software (Linacre, 2016) was used to perform the data
analysis. Parameters were estimated using joint maximum likelihood estimation procedures (Wright & Masters,
1982).
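
The category probabilities defined by the RRSM can be sketched in a few lines of Python. This is an illustrative sketch rather than the Winsteps implementation: the function name is hypothetical, the person measure of 0.0 logits is an assumed value, and the thresholds and item difficulties are simply borrowed from the structure calibrations in Table 3 and the difficulty measures in Table 4.

```python
import math
from typing import List, Sequence

def rsm_category_probabilities(beta: float, delta: float,
                               taus: Sequence[float]) -> List[float]:
    """Probabilities of categories x = 0..m under the rating scale model.

    beta  : person measure (logits)
    delta : item difficulty to endorse (logits)
    taus  : thresholds tau_1..tau_m (the x = 0 term uses the empty-sum convention)
    """
    # Running sums of [beta - (delta + tau_j)]; the x = 0 numerator is exp(0) = 1.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (beta - (delta + tau)))
    numerators = [math.exp(e) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]

# Thresholds from Table 3; the person measure of 0.0 is hypothetical.
taus = [-2.27, -0.67, 2.95]
probs_q8 = rsm_category_probabilities(beta=0.0, delta=-1.21, taus=taus)  # easiest item
probs_q1 = rsm_category_probabilities(beta=0.0, delta=2.38, taus=taus)   # hardest item
print([round(p, 3) for p in probs_q8])
print([round(p, 3) for p in probs_q1])
```

Index 0 of each returned list corresponds to Strongly Disagree and index 3 to Strongly Agree. For the same person, the probability mass shifts toward the lower categories as the item becomes more difficult to endorse.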

3. Results 

Before the results of the psychometric validation are presented, it is necessary to first explore summative 
descriptive statistics. Table 2 presents a summary of mean and Standard Deviation (SD) values for each of the 25 
items. 


Table 2. Results of traditional statistical analysis

Item                                                                          Mean     SD
1. Is proactive in reaching out to meet with me                               2.48   1.13
2. Is easy to get in touch with                                               3.25   0.84
3. Responds to my emails/calls in a timely manner                             3.36   0.78
4. Gives me as much time as I need when we meet                               3.44   0.71
5. Encourages me to seek him/her for help                                     3.14   0.95
6. Takes a personal interest in me                                            3.13   0.94
7. Is a good listener                                                         3.45   0.69
8. Respects my opinions and feelings                                          3.54   0.56
9. Is knowledgeable about academic policies and procedures in the college     3.44   0.63
10. Provides me with accurate information and answers to my questions         3.48   0.59
11. Refers me to other resources for assistance, when necessary               3.51   0.62
12. Encourages me to achieve my educational goals                             3.51   0.63
13. Helps me identify the obstacles I need to overcome to reach my academic   3.21   0.79
    and/or professional goals
14. Makes his or her self available to meet with me                           3.27   0.87
15. Helps me to examine my needs, interests, and values                       3.24   0.81
16. Is familiar with my academic background                                   3.17   0.85
17. Helps me explore careers in my area of interest                           3.14   0.93
18. Is approachable and easy to talk to                                       3.39   0.75
19. Encourages me to ask questions and discuss any concerns                   3.35   0.74
20. Shows concern for my personal growth and development                      3.26   0.81
21. Encourages me to discuss personal issues                                  2.82   1.01
22. Has a lot in common with me                                               2.88   0.78
23. Is helpful                                                                3.30   0.75
24. Is someone I consider a mentor                                            3.02   1.00
25. Is someone I would recommend as an advisor to other students              3.12   0.97


3.1 Dimensionality 

Virtually all survey data sets consist of multiple dimensions. The question, however, is to what degree
various dimensions are present and whether there is evidence of a single, primary underlying construct being measured. To
answer this question, a Rasch-based Principal Components Analysis (PCA) of standardized residual correlations
was performed to assess dimensionality. In total, the Rasch dimension explained 64.4% of the variance, with 20.8%
attributed to the items. The largest secondary dimension explained 4.4% of the variance and had an
eigenvalue of 3.0, indicating a strength of about 3 items. The ratio of the variance explained by the items (20.8%)
to that of the largest secondary dimension (4.4%) is about 5:1. Thus, for all practical purposes the data were
sufficiently unidimensional for a Rasch measurement analysis.
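
The roughly 5:1 ratio quoted above follows directly from the two reported percentages. The short Python sketch below simply reproduces that arithmetic; the function and variable names are hypothetical.

```python
def variance_ratio(primary_pct: float, secondary_pct: float) -> float:
    """Ratio of the variance explained by a primary source to a secondary one."""
    if secondary_pct <= 0:
        raise ValueError("secondary_pct must be positive")
    return primary_pct / secondary_pct

# Percentages reported in the dimensionality analysis.
item_variance_pct = 20.8        # variance attributed to the items
secondary_contrast_pct = 4.4    # variance of the largest secondary dimension

ratio = variance_ratio(item_variance_pct, secondary_contrast_pct)
print(f"items : largest secondary dimension = {ratio:.1f} : 1")
```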

3.2 Reliability 

Reliability was assessed using multiple methods. First, the traditional measure of reliability (Cronbach’s alpha)
was calculated; then reliability measures were calculated from the Rasch measurement framework using “real”
and “modeled” measures. Cronbach’s α reliability was .972 for the 25 items. Rasch-based reliability estimates
were .94 for “real” and .95 for “modeled”, suggesting true reliability is likely somewhere in between. All three
measures of reliability indicate highly reproducible measures (Royal & Hecker, 2015). Separation, which refers
to the number of statistically distinguishable levels within the data, was also assessed. The separation statistic
was 4.27, indicating approximately four statistically distinguishable levels were present in the data.
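
For readers who wish to reproduce these kinds of indices, the sketch below shows one way to compute Cronbach's alpha from a complete persons-by-items matrix, together with the standard conversion between a Rasch separation index G and reliability R, where R = G^2 / (1 + G^2). The function names and the small ratings matrix are hypothetical; the study's raw data are not reproduced here.

```python
import math
from statistics import variance
from typing import Sequence

def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a persons-by-items matrix with no missing data."""
    n_items = len(responses[0])
    item_columns = list(zip(*responses))
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)

def reliability_from_separation(separation: float) -> float:
    """Reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)

def separation_from_reliability(reliability: float) -> float:
    """Separation index implied by a reliability R: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))

# Hypothetical ratings (4 persons x 3 items), used only to exercise the function.
toy = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 4, 4]]
toy_alpha = cronbach_alpha(toy)

# The reported separation of 4.27 expressed on the reliability scale.
implied_reliability = reliability_from_separation(4.27)
print(round(toy_alpha, 3), round(implied_reliability, 3))
```

Under this conversion, a separation of 4.27 corresponds to a reliability between the reported “real” (.94) and “modeled” (.95) estimates.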

3.3 Rating Scale Effectiveness 

Rating scale diagnostics were assessed to determine the extent to which students were able to appropriately
interpret and make use of each rating scale category (see Table 3). Results indicate students made use of the full
rating scale, although fewer students tended to provide ratings of disagreement. Infit and outfit mean square fit
statistics were in the appropriate range, and the category measures and structure calibration measures each advanced in a
stepwise manner as anticipated (Linacre, 2002). Collectively, the results provide evidence the rating scale functioned
very well for this instrument.


Table 3. Rating scale diagnostics

Rating Category            n    %   INFIT MnSq   OUTFIT MnSq   Structure Calibration   Category Measure
(1) Strongly Disagree    186    5         1.00          1.06                    None              -3.50
(2) Disagree             407   12         1.13          1.22                   -2.27              -1.49
(3) Agree               1322   38          .95           .87                    -.67               1.17
(4) Strongly Agree      1603   46          .97           .94                    2.95               4.07


3.4 Item and Person Measure Quality 

A variety of statistical indicators were used to assess item and person measure quality (see Table 4). First, the
variability of item difficulty measures was investigated. Results indicate logit measures ranged from -1.21 to
2.38 with an average standard error of .17 (SD = .01). These values indicate adequate variability
with sufficiently small standard errors to ensure statistically stable measures. Infit and outfit mean square fit
statistics were evaluated using the recommendations provided by Wright and Linacre (1994), noting ideal values
should range between .60 and 1.40, and values exceeding 2.00 may be indicative of a “noisy” (potentially
problematic) item. Of the 25 items, 23 fell within ideal range for fit statistic values. Item #23 ([My advisor] is
helpful) yielded fit statistics slightly below .60, indicating these responses were somewhat predictable. Item #1
([My advisor] is proactive in reaching out to meet with me) yielded inflated infit and outfit statistics (1.83 and
2.26, respectively), thus indicating a potentially problematic item. Point-measure correlations were all high,
positive values, indicating excellent discriminatory abilities (Linacre, 2015a).
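
To make the fit statistics concrete, the following Python sketch computes infit and outfit mean squares for a single item from observed responses, model-expected responses, and model variances, and then labels a mean square using the cut-offs quoted above. The function names and labels are hypothetical, and the sketch is not the Winsteps implementation; the expected values and variances would come from a fitted Rasch model.

```python
from typing import Sequence, Tuple

def infit_outfit(observed: Sequence[float], expected: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item.

    observed  : responses to the item, one per person
    expected  : model-expected response for each person
    variances : model variance of each response
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of the squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(observed)
    # Infit: information-weighted mean square.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

def flag_fit(mean_square: float, low: float = 0.60, high: float = 1.40,
             noisy: float = 2.00) -> str:
    """Label a mean square using the cut-offs quoted in the text."""
    if mean_square > noisy:
        return "noisy"
    if mean_square > high:
        return "above ideal range"
    if mean_square < low:
        return "overly predictable"
    return "ideal"

# Reported infit and outfit values for items 1 and 23 (Table 4).
q1_flags = (flag_fit(1.83), flag_fit(2.26))
q23_flags = (flag_fit(0.52), flag_fit(0.46))
print(q1_flags, q23_flags)
```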

Similar procedures were used to assess person measure quality. Person measures ranged from -4.01 to 6.52 with an
average standard error of .52 (SD = .22). These values also indicate excellent variability with
sufficiently small standard errors to ensure stable measurements. Only 9 students (5.4%) from the sample had at
least one fit statistic exceeding 2.0, indicating possible misfit.

Table 4. Item quality indicators

Item   Difficulty Measure   Error   INFIT Mean Square   OUTFIT Mean Square   Point Measure Correlation
Q1                   2.38     .14                1.83                 2.26                         .75
Q2                   -.17     .17                1.48                 1.48                         .71
Q3                   -.38     .18                1.67                 1.72                         .64
Q4                   -.56     .19                1.31                 1.52                         .65
Q5                    .06     .17                1.11                  .98                         .81
Q6                    .23     .16                 .84                  .72                         .85
Q7                   -.71     .19                 .87                  .80                         .74
Q8                  -1.21     .20                 .61                  .44                         .75
Q9                   -.61     .19                1.29                 1.34                         .62
Q10                  -.86     .19                 .67                  .61                         .76
Q11                  -.86     .20                 .64                  .52                         .76
Q12                 -1.13     .20                 .63                  .65                         .77
Q13                   .33     .18                1.07                 1.07                         .76
Q14                  -.21     .18                 .87                  .73                         .81
Q15                   .28     .18                 .78                  .69                         .81
Q16                   .37     .18                1.05                 1.11                         .77
Q17                   .57     .17                 .99                  .90                         .80
Q18                  -.73     .18                 .83                  .71                         .78
Q19                  -.37     .19                 .80                  .72                         .79
Q20                  -.11     .18                 .67                  .64                         .84
Q21                  1.52     .17                1.23                 1.34                         .79
Q22                  1.52     .16                 .99                 1.23                         .77
Q23                  -.24     .18                 .52                  .46                         .85
Q24                   .57     .17                 .77                  .74                         .86
Q25                   .33     .17                 .78                  .69                         .85


Two items appeared to violate the assumption of local item independence; that is, they exhibited Local Item
Dependence (LID). Specifically, item #2 (Is easy to get in touch with) and item #3 (Responds to my emails/calls in
a timely manner) shared a standardized residual correlation of .70, indicating high statistical dependence. This
indicates that a student’s response to one of these items will likely correlate highly with his or her response to
the other item.

Differential Item Functioning (DIF) analyses were performed to assess whether the construct remained invariant across
relevant subgroups, particularly class year and gender. The iterative-logit (Rasch-Welch) method presented in
Linacre (2015b) was used. Because multiple comparisons were made across 25 items, a Bonferroni
correction was necessary to control the familywise Type I error rate. This reduced the significance threshold from the
conventional 0.05 to 0.002 (0.05/25). Results indicate no statistically
significant differences across items per subgroup.
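
The adjusted threshold is the familywise alpha divided by the number of comparisons. A minimal Python sketch of that arithmetic follows; the function name is hypothetical.

```python
def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return family_alpha / n_comparisons

# 25 item-level comparisons at a familywise alpha of 0.05, as in the DIF analysis.
adjusted_alpha = bonferroni_alpha(0.05, 25)
print(f"adjusted alpha = {adjusted_alpha:.3f}")
```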

3.5 Construct Hierarchy 

In psychometrics, the manner in which items are ordered along a linear continuum is considered a construct. In
essence, most substantive results emanating from a Rasch measurement analysis can be found in the construct
map (also known as a Wright Map). In short, a construct map presents a visual snapshot of the “psychometric
ruler” onto which person and item measures have been placed (see Figure 1). In the present study, students
appear on the left side of the map, with those near the top representing individuals who had the least difficulty
endorsing the items and those appearing at the bottom having the greatest difficulty endorsing the items.
Likewise, items appear on the right side of the map, with those most difficult to endorse at the top and those
easiest to endorse at the bottom. Here, students had the most difficulty endorsing item Q1 ([My advisor] is
proactive in reaching out to meet with me) and the least difficulty endorsing items Q12 ([My advisor]
encourages me to achieve my educational goals) and Q8 ([My advisor] respects my opinions and feelings).


[Figure: Wright map showing person measures (left) and item measures (right) on a common logit scale, with items
ordered from more difficult to endorse (top) to less difficult to endorse (bottom).]

Figure 1. Construct hierarchy 


4. Discussion 

4.1 Psychometric Properties of the FASBI

Samuel Messick’s (1989) framework for construct validity provides a useful guide for interpreting validity 
evidence. According to Messick, validity is the integration of any evidence that impacts the interpretation or 
meaning of a score. Messick’s framework consists of six unique “aspects” of validity: substantive, content, 
generalizability, structural, external and consequential. It is from this framework that we appraise validity
evidence present in the aforementioned results. 

First, a Rasch-based PCA of standardized residual correlations showed the data were primarily unidimensional,
indicating a single, primary latent trait was being measured. This evidence speaks to the substantive aspect of
validity. Measures of reliability consistently exceeded .90, indicating highly reproducible measures. This
evidence speaks to the generalizability aspect of validity. An evaluation of rating scale diagnostics indicated the
rating scale functioned appropriately. This speaks to the structural aspect of validity. An evaluation of item and
person measures confirmed the measures were psychometrically sound. This speaks to the content aspect of
validity. Additional validity evidence was discernible by way of DIF analyses that confirmed measures were
invariant across class year and sex subpopulations. This speaks to the systematic aspect of validity and provides
additional support for the generalizability aspect of validity. Results of this study have not been correlated with
other studies; thus we present no evidence that speaks to the external aspect of validity. Finally, because the
FASBI has not been used previously we cannot speak to the consequential aspect of validity, which involves
consequences (positive or negative) resulting from the use of the instrument (Royal & Puffer, 2014). In sum,
there is an abundance of validity evidence to support the psychometric quality of the FASBI and its ability to
produce high-quality measures.

4.2 Implications 

The purpose of this study was to evaluate the psychometric properties of a newly developed instrument intended
to measure faculty skills and behaviors as they pertain to their role as advisors, particularly in medical and
professional programs (e.g., veterinary medicine, pharmacy, dentistry, physical therapy, etc.). Results of a
thorough investigation of the FASBI’s psychometric properties indicate the instrument is
psychometrically sound. Therefore, persons interested in evaluating faculty advising in medical, health, and
various professional programs are especially encouraged to use this instrument. Of course, the FASBI may be
relevant and appropriate for persons interested in evaluating advising in other disciplines as well.

The FASBI may be particularly helpful for evaluators as it addresses a wide variety of topics of concern for most
academic programs. Having insights about how students feel with respect to each of the items may be
particularly informative for identifying attributes of more and less successful faculty advisors.
Furthermore, identifying these strengths and weaknesses would be particularly helpful for preparing training
that helps faculty become more effective advisors.

Finally, another implication of this study is methodological in nature. The techniques presented in this study
involve what many measurement experts consider to be “gold standard” methods for conducting survey
validation studies. Further, the use of Messick’s framework for evaluating and organizing validity evidence may
help other researchers better identify how to present validity evidence to others.

4.3 Limitations 

Of course, this study is not without its limitations. While this study intentionally made no effort to speak to
substantive findings, as that was beyond the scope of this paper, it remains a potential limitation that the survey
data were self-reported. The extent to which students may have provided socially desirable responses,
or to which non-response bias may be an issue, remains unknown. With respect to the sample frame, students were by
and large female, which is typically the norm for veterinary medical programs. Despite the comparative
underrepresentation of male respondents, it is important to note that tests for DIF were conducted to
determine if students appear to respond differently to the FASBI items based on sex. Results of the DIF analysis
indicated that participants’ sex did not affect students’ responses to any FASBI items in any systematic way,
t(833) = .00, p = 1.000. Finally, veterinary medicine, much like all medical professional programs, has significant
issues with regard to diversity. Future research should evaluate the functioning of FASBI items based on race
and ethnicity as well.

5. Conclusion 

Effective faculty advising remains a problem in many colleges and universities, including medical and health
professions programs. To date, relatively few institutions have actively evaluated faculty advising. We suspect
this may be in part due to a lack of appropriate instrumentation. Thus, this study sought to investigate the
psychometric properties of an instrument intended to assess faculty competence as it pertains to advising. Results
of the psychometric validation study provide a considerable amount of validity evidence to support the quality of
the FASBI. We encourage others to adopt the FASBI for similar evaluations of faculty advising.